
5 research outputs found

Exploring Restart Distributions

Authors: Islam Riashat; Kormushev Petar; Levdik Vitaly; Smith Christopher M.; Tavakoli Arash
Publication date: 01/07/2019

We consider the generic approach of using an experience memory to help exploration by adapting a restart distribution. That is, given the capacity to reset the state with those corresponding to the agent's past observations, we help exploration by promoting faster state-space coverage via restarting the agent from a more diverse set of initial states, as well as allowing it to restart in states associated with significant past experiences. This approach is compatible with both on-policy and off-policy methods. However, a caveat is that altering the distribution of initial states could change the optimal policies when searching within a restricted class of policies. To reduce this unsought learning bias, we evaluate our approach in deep reinforcement learning which benefits from the high representational capacity of deep neural networks. We instantiate three variants of our approach, each inspired by an idea in the context of experience replay. Using these variants, we show that performance gains can be achieved, especially in hard exploration problems.

Comment: RLDM 201
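The restart idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `ChainEnv` toy environment, the `explore` helper, the `restart_prob` parameter, and the choice of sampling uniformly over distinct remembered states are all assumptions made for illustration, and the paper's three variants are not reproduced here.

```python
import random


class ChainEnv:
    """Hypothetical 1-D chain with states 0..n_states-1 and actions -1/+1."""

    def __init__(self, n_states=50):
        self.n_states = n_states
        self.state = 0

    def reset(self, state=0):
        # Assumption: the simulator can be reset to any previously observed state.
        self.state = state
        return self.state

    def step(self, action):
        # Move left or right, clamped to the ends of the chain.
        self.state = min(max(self.state + action, 0), self.n_states - 1)
        return self.state


def explore(env, episodes, horizon, restart_prob, rng):
    """Random-walk exploration; returns the set of states visited so far."""
    visited = {0}  # doubles as the experience memory of past observations
    for _ in range(episodes):
        if rng.random() < restart_prob:
            # Restart from a uniformly sampled remembered state.
            start = rng.choice(sorted(visited))
        else:
            # Default restart distribution: always the fixed initial state.
            start = 0
        state = env.reset(start)
        for _ in range(horizon):
            state = env.step(rng.choice((-1, 1)))
            visited.add(state)
    return visited
```

Calling `explore(ChainEnv(), episodes=20, horizon=5, restart_prob=0.0, rng=random.Random(0))` can never leave states 0 to 5, because every episode starts at state 0; setting `restart_prob` above zero lets later episodes begin from remembered states further along the chain.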

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Development of an open technology sensor suite for assisted living: a student-led research project.

Authors: Amjad Omar A; Baekelandt Géraldine; Bonner Oliver; Hadeler Oliver; Hall Richard D; Hughes Josephine AE; Hutter Tanya; Kaminski Clemens F; Levdik Vitaly; Mair Philip; Manton James D; Miele Isabella; Vasconcellos Fernando da Cruz; Wang Tiesheng
Publication venue: Interface Focus
Publication date: 17/06/2016

Many countries have a rapidly ageing population, placing strain on health services and creating a growing market for assistive technology for older people. We have, through a student-led, 12-week project for 10 students from a variety of science and engineering backgrounds, developed an integrated sensor system to enable older people, or those at risk, to live independently in their own homes for longer, while providing reassurance for their family and carers. We provide details on the design procedure and performance of our sensor system and the management and execution of a short-term, student-led research project. Detailed information on the design and use of our devices, including a door sensor, power monitor, fall detector, general in-house sensor unit and easy-to-use location-aware communications device, is given, with our open designs being contrasted with closed proprietary systems. A case study is presented for the use of our devices in a real-world context, along with a comparison with commercially available systems. We discuss how the system could lead to improvements in the quality of life of older users and increase the effectiveness of their associated care network. We reflect on how recent developments in open source technology and rapid prototyping increase the scope and potential for the development of powerful sensor systems and, finally, conclude with a student perspective on this team effort and highlight learning outcomes, arguing that open technologies will revolutionize the way in which technology will be deployed in academic research in the future.

This is the final version of the article. It first appeared from Royal Society Publishing via http://dx.doi.org/10.1098/rsfs.2016.001

PubMed Central

Apollo (Cambridge)

Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks

Authors: Kormushev Petar; Levdik Vitaly; Pardo Fabio
Publication venue: Association for the Advancement of Artificial Intelligence (AAAI)
Publication date: 03/04/2020

Being able to reach any desired location in the environment can be a valuable asset for an agent. Learning a policy to navigate between all pairs of states individually is often not feasible. An all-goals updating algorithm uses each transition to learn Q-values towards all goals simultaneously and off-policy. However, the large number of expensive parallel updates has so far limited the approach to small tabular cases. To tackle this problem, we propose to use convolutional network architectures to generate Q-values and updates for a large number of goals at once. We demonstrate the accuracy and generalization qualities of the proposed method on randomly generated mazes and Sokoban puzzles. In the case of on-screen goal coordinates, the resulting mapping from frames to distance-maps directly informs the agent about which places are reachable and in how many steps. As an example application, we show that replacing the random actions in ε-greedy exploration with several actions towards feasible goals generates better exploratory trajectories on Montezuma's Revenge and Super Mario All-Stars games.
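The small tabular case mentioned in this abstract can be sketched as follows. This is only an illustration of an all-goals update, not the paper's convolutional method: the five-state chain, the helper names (`make_q`, `all_goals_update`, `chain_step`, `train`), and the reward of -1 per step with termination on reaching the goal are assumptions chosen so that the learned values can be read as negative step counts.

```python
def make_q(n_states, n_actions):
    # q[s][a][g]: value of taking action a in state s when the goal is state g.
    return [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]


def all_goals_update(q, s, a, s_next, alpha=1.0, gamma=1.0):
    """Use one transition (s, a, s_next) to update the Q-values of every goal."""
    for g in range(len(q[s][a])):
        if s_next == g:
            # Assumed reward scheme: -1 per step, episode ends at the goal.
            target = -1.0
        else:
            # Off-policy bootstrap from the greedy value at s_next for goal g.
            target = -1.0 + gamma * max(q_a[g] for q_a in q[s_next])
        q[s][a][g] += alpha * (target - q[s][a][g])


def chain_step(s, a, n_states):
    # Hypothetical chain dynamics: action 1 moves right, action 0 moves left.
    return min(max(s + (1 if a == 1 else -1), 0), n_states - 1)


def train(n_states=5, sweeps=10):
    """Sweep every (state, action) pair repeatedly, updating all goals each time."""
    q = make_q(n_states, 2)
    for _ in range(sweeps):
        for s in range(n_states):
            for a in (0, 1):
                all_goals_update(q, s, a, chain_step(s, a, n_states))
    return q
```

Under these assumptions, `-max(q[s][a][g] for a in (0, 1))` after training gives the number of steps needed to reach goal `g` from state `s`, which is the tabular analogue of the distance-maps described in the abstract. The inner loop over goals is the per-transition cost that the abstract says restricted the approach to small tabular cases.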

Association for the Advancement of Artificial Intelligence: AAAI Publications